tions to knowledge of results. Data were obtained from 447 college students
randomly  assigned  to  one  of  the  12  experimental  conditions.  The  results  indi¬ 
cated  that  there  were  no  effects  on  ability  estimates  due  to  knowledge  of  re¬ 
sults,  testing  strategy,  or  pacing  of  item  presentation.  Although  average  la¬ 
tencies  were  greater  on  the  stradaptive  tests  than  on  the  conventional  test, 
the  overall  testing  time  was  not  substantially  longer  on  the  adaptive  tests  and 
may  have  been  a  function  of  differences  in  test  difficulty.  Analysis  of  infor¬ 
mation  values  indicated  higher  levels  of  information  on  the  stradaptive  tests 
than  on  the  conventional  test.  There  was  no  statistically  significant  main 
effect for any of the three experimental conditions when test anxiety or
test-taking motivation were the dependent variables, although there were some
significant interaction effects. These results indicate that testing conditions
may interact in a complex way to determine psychological reactions to the
testing environment. The interactions do suggest, however, a somewhat consistent
standardizing  effect  of  KR  on  test  anxiety  and  test-taking  motivation.  This 
standardizing  effect  of  KR  showed  that  approximately  equal  levels  of  motivation 
and  anxiety  were  reported  under  the  various  testing  conditions  when  KR  was  pro¬ 
vided,  but  that  mean  levels  of  these  variables  were  substantially  different 
when  KR  was  not  provided.  Consistent  with  theoretical  expectations,  the  con¬ 
ventional  test  was  perceived  as  being  either  too  easy  or  too  difficult,  whereas 
the  adaptive  tests  were  perceived  more  often  as  being  of  appropriate  difficul¬ 
ty.  The  results  concerning  the  effects  of  KR  on  test  performance,  motivation, 
and  anxiety  found  in  this  study  were  contrary  to  earlier  reported  findings;  and 
differences  in  the  studies  are  delineated.  Recommendations  are  made  concerning 
the  control  of  specific  testing  conditions,  such  as  difficulty  of  the  test  and 
ability  level  of  the  examinee  population,  as  well  as  suggestions  for  the  fur¬ 
ther  analysis  of  the  standardizing  effect  of  KR. 
Effects of Immediate Feedback and
Pacing  of  Item  Presentation  on 
Ability  Test  Performance  and 
Psychological  Reactions  to  Testing 


The  motivation  to  perform  well  on  an  ability  test  has  been  suggested  as  a 
significant factor affecting test performance (Cronbach, 1970). Some researchers
(e.g., Bayroff, 1964; Betz, 1975; Betz & Weiss, 1976b; Ferguson & Hsu, 1971;
Strang & Rust, 1973; Zontine, Richards, & Strang, 1972) have hypothesized that
immediate feedback or knowledge of results (KR) may increase motivation to
perform well. Others have suggested mechanisms by which KR affects behavior.

Locke,  Cartledge,  and  Koeppel  (1968)  have  offered  an  explanation  for  the  way  in 
which  motivation  is  affected  by  KR  through  goal-setting  behavior.  They  have 
indicated  that  KR  may  mediate  test-taking  behavior  if  an  examinee  makes  an  eval¬ 
uation  of  performance  in  response  to  receiving  KR  and  adjusts  his/her  subsequent 
level  of  effort.  In  such  a  process  the  examinee  sets  a  goal,  and  the  intent  to 
achieve that goal alters test-taking behavior. In this view, KR without
goal-setting does not influence test-taking behavior. Such an explanation
presupposes that goals and intentions influence behavior.

Another  way  in  which  KR  is  hypothesized  to  affect  test-taking  behavior  is 
through  increasing  or  decreasing  test  anxiety  (Liebert  &  Morris,  1967;  Morris  & 
Fulmer, 1976). Negative or failure feedback is hypothesized to increase
anxiety, and positive feedback is hypothesized to decrease anxiety. Failure
feedback would tend to increase expectancy of poor performance and thus tend to
increase worry or concern about test performance. It is suggested, however, that
failure feedback may have a facilitative or motivating effect on low-anxiety
(high-ability) students. Examinees with expectations of good performance, in
general, would be less anxious about test performance. Thus, according to this
conceptualization, test anxiety varies inversely with expectancy of performance. In
addition,  Liebert  and  Morris  (1967)  and  Morris  and  Fulmer  (1976)  posit  that  test 
anxiety  has  a  detrimental  effect  on  test  performance.  They  also  state  that 
feedback  affects  test  performance  because  of  the  certainty  an  examinee  attaches 
to  judgments  of  performance  level.  Thus,  it  is  not  only  expectancy  of  test  per¬ 
formance  that  is  affected  by  feedback  but  also  certainty.  Two  examinees  with 
the  same  expectancy  of  performance  may  differ  with  respect  to  the  certainty  of 
their  judgment  of  performance  because  one  received  feedback  and  the  other  did 
not. The examinee with the greater certainty, which is ensured by providing
accurate feedback, will have less anxiety than the examinee attaching less
certainty to his/her judgment of test performance, even though he/she expects to do
more poorly.

Although two seemingly different mechanisms have been hypothesized to
account for the way in which KR mediates test-taking behavior, the motivation
variable discussed by Locke et al. (1968) has some similarity to the anxiety
variable proposed by Morris and Fulmer (1976). The mechanism that motivates an
examinee to try harder on a test in response to negative KR may be termed
facilitating anxiety (Betz & Weiss, 1976b; Mandler & Sarason, 1952). Although
Morris and Fulmer (1976) stated that negative KR makes examinees more anxious and
that anxiety detrimentally affects test performance, it is possible that such
anxiety may serve to improve or to facilitate test performance. Certain groups,
for example, high-ability college students, may respond more often to negative KR
by trying harder on the test than would lower ability students (Betz & Weiss,
1976a).

Immediate  feedback  of  test  performance,  although  difficult  to  provide  when 
administering a paper-and-pencil test, is a relatively simple procedure when
tests  of  ability  or  achievement  are  administered  by  computer.  For  this  reason, 
adaptive  testing  research  has  facilitated  investigation  of  the  effects  KR  might 
have  on  test  performance  and  on  psychological  reactions,  such  as  motivation  and 
anxiety,  toward  testing.  Betz  and  Weiss  (1976a,  1976b)  studied  the  effects  of 
KR  on  high-  and  low-ability  students  taking  either  a  50-item  conventional  test 
or  a  stradaptive  test.  They  found  that  mean  test  performance  as  measured  by 
maximum  likelihood  ability  estimates  was  higher  when  KR  was  provided  than  when 
it  was  not  (Betz  &  Weiss,  1976a).  They  also  found  that  mean  number-correct 
score  was  higher  under  KR  than  under  no-KR  conditions  for  students  taking  a  con¬ 
ventional  (i.e.,  nonadaptive)  test.  Betz  and  Weiss  (1976b)  found  no  main  effect 
due to KR on measures of motivation and anxiety as assessed by a posttest
questionnaire of Likert-type items. However, a significant interaction between KR
and ability indicated that among high-ability students those receiving KR
reported a higher level of motivation than those not receiving KR, whereas
low-ability students receiving KR reported a lower level of motivation than those
not  receiving  KR. 

Prestwood  and  Weiss  (1978)  studied  the  effects  of  KR  and  test  difficulty  on 
test  performance  and  on  psychological  reactions  to  testing  in  high-ability  stu¬ 
dents.  They  found  that,  similar  to  the  results  reported  by  Betz  and  Weiss 
(1976b),  there  was  no  difference  in  anxiety  due  to  provision  of  KR;  but  among 
high-ability  college  students,  motivation  was  higher  when  KR  was  provided  than 
when it was not. In addition, a marginally significant (p = .054) interaction
between  KR  and  test  difficulty  factors  on  maximum  likelihood  ability  estimates 
indicated  that  when  KR  was  provided,  the  mean  ability  estimate  was  highest  on 
the  most  difficult  tests  and  lowest  on  the  least  difficult  tests.  The  mean 
ability  estimate  for  students  in  the  no-KR  conditions  was  highest  on  the  least 
difficult  tests  and  lowest  on  the  most  difficult  tests. 

Purpose 

In  the  studies  presented  above  (Betz  &  Weiss,  1976a,  1976b;  Prestwood  & 
Weiss,  1978),  which  were  designed  to  assess  the  effects  of  KR,  the  provision  of 
KR was confounded with pacing of item presentation. That is, the rate or pacing
of item presentation in the KR condition differed from the pacing in the no-KR
condition. In those studies, tests in the KR condition were essentially
self-paced; after receiving appropriate feedback following an item response,
students typed the letter "P" (for proceed) on the terminal keyboard and pressed
the "Return" key in order to initiate the presentation of the next test item. In
the no-KR condition, tests were computer-paced, i.e., students not receiving KR
were automatically presented with the next test item immediately following their
response to each item.
The present study was designed to separately examine the effects of KR and
of  computer-  versus  self-pacing  of  item  presentation  in  order  to  determine  the 
unconfounded  contribution  of  each  to  the  results  reported  in  the  research  cited. 
Since  the  effects  of  KR  and  pacing  might  differ  depending  on  whether  an  adaptive 
or  a  conventional  test  was  administered,  the  study  investigated  the  joint  ef¬ 
fects  of  KR,  pacing  of  item  presentation,  and  type  of  testing  strategy  on  abili¬ 
ty  test  performance  and  on  psychological  reactions  to  testing. 


METHOD 

Procedure 


Subjects 


The  447  subjects  who  participated  in  this  experiment  were  students  drawn 
from  an  introductory  psychology  course  at  the  University  of  Minnesota.  The  stu¬ 
dents  were  volunteers  who  received  points  that  counted  toward  their  final  course 
grade  in  return  for  their  participation  in  the  experiment. 

Test  Administration 

Students  were  assigned  sequentially  to  testing  conditions  in  which  they 
took  a  test  at  an  individual  cathode-ray  terminal  (CRT).  Each  terminal  was  con¬ 
nected  to  a  Hewlett-Packard  real-time  computer  system.  A  test  proctor  was  pre¬ 
sent  in  the  testing  room  to  provide  assistance  to  any  examinee  during  testing. 
Students  were  assured  that  they  could  take  as  much  time  as  necessary  to  complete 
the  test. 

Prior  to  actual  testing,  instructional  screens  explaining  the  operation  of 
the  CRTs  were  displayed.  After  students  reviewed  the  test  instructions  and  re¬ 
sponded  to  a  number  of  identification  and  demographic  questions,  the  experimen¬ 
tal  test  was  administered.  Each  experimental  ability  test  was  composed  of  five- 
alternative  multiple-choice  vocabulary  questions,  which  students  answered  by 
typing  a  number  on  the  CRT  keyboard  that  corresponded  to  the  chosen  alternative. 
Following  the  experimental  test,  examinees  not  receiving  feedback  recorded  their 
reactions  to  the  test  by  responding  to  18  Likert-type  questions.  Students  re¬ 
ceiving  feedback  responded  to  the  same  18  questions  as  well  as  to  8  additional 
questions  concerning  their  reactions  toward  feedback. 

Design 


Independent  Variables 


This study analyzed three independent variables in a 3 x 2 x 2 completely
crossed design. One factor was ability test strategy. The three levels of this
factor were (1) a 50-item peaked conventional test, (2) a variable-length
stradaptive test (Weiss, 1973), and (3) a fixed-length (50-item) stradaptive test.

The  second  factor  was  immediate  knowledge  of  results  (KR);  there  were  two  levels 
of  this  factor:  (1)  KR  and  (2)  no-KR.  The  third  factor  was  pacing  of  item  pre¬ 
sentation:  Items  were  either  (1)  computer-paced  or  (2)  self-paced. 

Students  in  the  KR  conditions  were  informed  by  the  computer  immediately 
after their response to a test question whether it was correct or incorrect; if
it  was  incorrect,  the  correct  alternative  was  given.  In  the  KR  computer-paced 
conditions  this  was  followed  by  a  4-second  delay  until  the  next  question  was 
presented.  In  the  no-KR  computer-paced  conditions  there  was  no  delay  between 
item  response  and  presentation  of  the  next  item.  Under  self-pacing  conditions, 
either  with  or  without  KR,  students  could  pace  the  rate  of  item  presentation: 
After responding to an item, a student could call up the next item by typing "P"
(for proceed) and pressing the "Return" key.

Dependent  Variables 

Ability  estimates.  A  major  dependent  variable  of  interest  was  test  perfor¬ 
mance  of  the  examinee,  which  was  estimated  in  three  ways.  Performance  on  the 
stradaptive  and  conventional  tests  was  assessed  by  maximum  likelihood  ability 
estimates  computed  for  each  person  by  employing  the  likelihood  equation  for 
Birnbaum's  (1968)  three-parameter  logistic  model.  A  second  ability  measure  was 
proportion-correct  scores,  which  were  computed  for  students  who  took  the  conven¬ 
tional  test.  The  proportion-correct  score,  which  is  the  ratio  of  number  of 
items answered correctly to total number of items administered, is an
inappropriate measure of ability when a test is adapted to an individual's level
of ability (Weiss, 1973, 1974). For this reason, proportion-correct scores were
not computed in stradaptive testing conditions. The third ability measure, used
only  for  the  stradaptive  tests,  was  the  mean  difficulty  correct  score,  which  was 
found  in  previous  stradaptive  testing  research  to  be  a  valid  (Thompson  &  Weiss, 
1980)  and  reliable  (Vale  &  Weiss,  1975a,  1975b)  approach  to  ability  estimation 
in  stradaptive  tests.  The  mean  difficulty  correct  score  was  computed  by  averag¬ 
ing  the  normal  ogive  difficulty  parameters  of  the  items  answered  correctly  on 
the  stradaptive  test  by  each  individual. 
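
The two scoring approaches described above can be illustrated with a minimal Python sketch. It is not the scoring program used in the study. It assumes the conventional scaling constant D = 1.7 to place normal ogive parameters on the logistic metric, uses the guessing value of .20 assumed for the item pool, and locates the maximum by a simple grid search, since the report does not state which numerical procedure was used. All function names and item parameters are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant (logistic approximation to the normal ogive)


def p_correct(theta, a, b, c=0.20):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability level theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def ml_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of theta on [lo, hi]."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


def mean_difficulty_correct(items, responses):
    """Average difficulty (b) of the items answered correctly."""
    correct = [b for (a, b), u in zip(items, responses) if u]
    return sum(correct) / len(correct) if correct else float("nan")


# Hypothetical (a, b) pairs and one response pattern, for illustration only.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
responses = [1, 1, 0]
theta_hat = ml_ability(items, responses)
mdc = mean_difficulty_correct(items, responses)
```

Note that an all-correct or all-incorrect pattern has no finite maximum, so the grid search simply returns the boundary of the search interval in those cases.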

Response  latency.  Mean  latency  of  response  was  calculated  for  each  indi¬ 
vidual.  Measured  in  seconds,  this  value  represents  the  average  time  it  took  an 
individual  to  read  and  respond  to  an  item.  However,  since  the  length  of  each 
item  was  quite  similar,  latency  would  serve  as  a  rough  indication  of  the  "pro¬ 
cessing  time"  required  by  the  individual  to  answer  an  item.  The  mean  latency 
measured  was  mean  latency  over  all  items  administered,  in  order  to  determine 
whether  testing  conditions  affected  processing  time. 

Information.  Information  is  an  index  of  precision  of  measurement  (Bejar  & 
Weiss,  1979;  Bejar,  Weiss,  &  Gialluca,  1977).  Although  information  is  similar 
in  function  to  reliability,  it  differs  in  that  information  values  are  appropri¬ 
ate  in  describing  precision  at  any  level  of  the  trait  continuum.  Thus,  test 
information  can  be  used  to  evaluate  testing  strategies  (e.g.,  Bejar  et  al., 

1977;  Betz  &  Weiss,  1974,  1975;  McBride  &  Weiss,  1976;  Vale,  1975;  Vale  &  Weiss, 
1975b,  1977).  For  example,  testing  strategies  with  high  information  values  over 
all  trait  levels  are  to  be  preferred  to  tests  with  either  low  or  peaked  informa¬ 
tion  curves.  In  this  study  comparisons  between  testing  conditions  were  based  on 
response  pattern  information  values  derived  from  the  second  derivative  of  the 
log-likelihood function evaluated at each individual's final ability estimate
(Bejar  &  Weiss,  1979).  These  response  pattern  information  values  were  calculat¬ 
ed  for  each  person  in  every  experimental  condition. 
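
As a rough illustration of this index, the following Python sketch approximates response pattern information as the negative second derivative of the three-parameter logistic log-likelihood at a given ability estimate. The central finite difference, the scaling constant D = 1.7, and all names are assumptions made for the example; the study's own computation may well have used an analytic derivative.

```python
import math

D = 1.7  # assumed scaling constant (logistic approximation to the normal ogive)


def log_likelihood(theta, items, responses, c=0.20):
    """3PL log-likelihood of a 0/1 response pattern at theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
        total += math.log(p if u else 1.0 - p)
    return total


def response_pattern_information(theta_hat, items, responses, c=0.20, h=1e-3):
    """Negative second derivative of the log-likelihood at theta_hat,
    approximated by a central finite difference with step h."""
    def f(t):
        return log_likelihood(t, items, responses, c)
    second = (f(theta_hat + h) - 2.0 * f(theta_hat) + f(theta_hat - h)) / (h * h)
    return -second
```

With c set to 0 the result reduces to the familiar sum of D²a²P(1 - P) over the items administered, which gives a convenient check on the approximation.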

Psychological  reactions.  Psychological  reactions  to  the  testing  conditions 
were also of interest in this study. Measures of reported anxiety, motivation,
and perception of test difficulty were obtained from Likert-type items. In
addition, those students in the KR conditions provided data on their reactions to
KR.  Scales  used  to  measure  these  variables  were  those  used  by  Prestwood  and 
Weiss  (1978).  A  total  of  26  questions  were  administered  to  the  students  in  the 
KR  conditions;  those  in  the  no-KR  conditions  responded  to  18  psychological  reac¬ 
tions  questions;  these  questions  were  administered  immediately  following  the 
experimental  tests.  The  questions  and  their  order  of  administration  are  shown 
in  Table  7  (Motivation),  Table  9  (Anxiety),  Table  11  (Difficulty  Perception), 
and  Table  13  (KR  Reaction);  Table  14  shows  eight  additional  questions  adminis¬ 
tered  that  were  not  included  in  these  four  scales. 

Test  Construction 

Item  Pool 


The  item  pool  from  which  the  conventional  and  stradaptive  test  items  were 
drawn  consisted  of  569  five-alternative  multiple-choice  vocabulary  items.  Item 
response  function  (IRF,  or  item  characteristic  curve)  parameter  estimates  were 
obtained  from  samples  of  the  college  student  population,  according  to  the  proce¬ 
dure  described  by  Prestwood  and  Weiss  (1977,  Appendix  A).  Each  item  had  asso¬ 
ciated  with  it  a  normal  ogive  discrimination  (a)  and  difficulty  (b)  parameter 
estimate.  The  "guessing"  (c)  parameter  of  each  of  the  five-alternative  items 
was  assumed  to  be  .20. 

Conventional Test

The  peaked  50-item  conventional  test  was  composed  of  items  whose  difficulty 
parameters  centered  around  the  ability  level  of  the  student  population.  Fifty 
items  were  chosen  so  that  the  mean  difficulty  of  the  items  matched  the  estimated 
ability  level  of  students  taking  the  test  and  so  that  normal  ogive  discrimina¬ 
tion  parameters  were  greater  than  or  equal  to  a  =  .40  (see  Appendix  Table  A). 

The  mean  of  the  difficulty  parameters  was  b  =  .02,  although  the  values  varied 
from  b  =  -.355  to  b  =  +.334,  with  a  standard  deviation  of  .20.  The  mean  of  the 
discrimination  values  was  a  =  .88,  with  values  ranging  from  a  =  .407  to  a  =  1.96 
and  a  standard  deviation  of  .35. 

Stradaptive  Tests 

Stradaptive  testing  required  a  stratified  item  pool  (Weiss,  1973)  with 
items grouped by difficulty (b) parameters into nine nonoverlapping strata.
Within  a  stratum,  items  were  arranged  in  descending  order  of  their  discrimina¬ 
tion  (a)  parameter  estimates.  The  number  of  items  in  each  stratum  varied,  rang¬ 
ing  from  16  items  in  Stratum  9  (the  most  difficult  stratum)  to  57  items  in  Stra¬ 
tum  7.  Three  hundred  twenty-five  items  were  selected  from  the  total  item  pool 
so  that  no  item  with  a  discrimination  parameter  estimate  less  than  a  =  .30  was 
included  in  the  stradaptive  item  pool.  Appendix  Table  B  shows  IRF  parameter 
estimates  for  items  in  the  stradaptive  item  pool. 

The  entry  point  or  stratum  level  from  which  the  first  item  was  selected  for 
administration  was  based  on  student-estimated  college  grade-point-average  (GPA) 
level.  Students  with  higher  reported  GPA  received  an  item  from  a  corresponding¬ 
ly  difficult  stratum,  following  the  procedure  used  by  Thompson  and  Weiss  (1980, 
p. 5). Thereafter, items were selected according to an "up-one/down-one"
branching procedure. By this method, correct answers resulted in branching to
the  next  more  difficult  stratum,  whereas  incorrect  answers  routed  a  testee  to  an 
item  at  the  next  easier  stratum  of  items.  At  every  point  during  testing,  the 
most  discriminating  item  of  those  remaining  in  a  given  stratum  that  had  not  al¬ 
ready  been  administered  to  a  given  individual  was  selected  as  the  next  item  to 
be  administered. 

Testing  continued  until  50  items  had  been  administered  in  the  fixed-length 
stradaptive  test,  but  testing  terminated  in  the  variable-length  stradaptive  test 
when  conditions  set  by  the  termination  criterion  were  met.  According  to  this 
criterion,  testing  was  terminated  when  a  stratum  was  identified  at  which  a  stu¬ 
dent  responded  to  a  series  of  items  at  chance  level  or  below.  Chance  level  was 
defined to be the reciprocal of the number of alternatives in the
multiple-choice question. In this case, the multiple-choice questions each had five
alternatives, so the chance level of responding was set as a proportion correct of
.20  within  a  stratum.  In  order  to  implement  this  condition,  however,  a  minimum 
of  five  questions  within  a  stratum  were  required  to  be  administered  prior  to 
termination.  If  the  termination  criterion  was  not  reached  by  administration  of 
the  75th  item,  testing  was  terminated  at  that  point. 
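
The branching and termination rules described above can be sketched in Python as follows. This is an illustrative reading of the text, not the original testing program. The `answer` callback, the behavior at the easiest and most difficult strata (remaining in place), and stopping when a stratum runs out of items are all assumptions introduced for the example.

```python
def run_stradaptive(strata, entry, answer, fixed_length=None,
                    max_items=75, min_in_stratum=5, chance=0.20):
    """Administer items by up-one/down-one branching.

    strata       -- list of strata (index 0 = easiest); each stratum is a list
                    of item ids already sorted by descending discrimination
    entry        -- index of the stratum supplying the first item
    answer       -- hypothetical callback: answer(item_id) -> True if correct
    fixed_length -- stop after this many items; None selects the
                    variable-length termination rule
    """
    next_pos = [0] * len(strata)  # next unused item within each stratum
    tried = [0] * len(strata)     # items administered per stratum
    right = [0] * len(strata)     # items answered correctly per stratum
    record = []
    s = entry
    limit = fixed_length if fixed_length is not None else max_items
    # Assumption: testing also stops if the current stratum is exhausted.
    while len(record) < limit and next_pos[s] < len(strata[s]):
        item = strata[s][next_pos[s]]  # most discriminating unused item
        next_pos[s] += 1
        correct = bool(answer(item))
        record.append((s, item, correct))
        tried[s] += 1
        right[s] += int(correct)
        # Variable-length rule: chance-level responding within a stratum,
        # checked only after the minimum number of items in that stratum.
        if (fixed_length is None and tried[s] >= min_in_stratum
                and right[s] / tried[s] <= chance):
            break
        # Branch up after a correct answer, down after an incorrect one.
        s = min(s + 1, len(strata) - 1) if correct else max(s - 1, 0)
    return record
```

For example, a simulated examinee who answers every item below the sixth stratum correctly and every item in it incorrectly would alternate between two adjacent strata until five items in the harder stratum had been missed.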

Data  Analysis 

Ability  Estimates 

Maximum  likelihood.  Maximum  likelihood  ability  estimates  were  calculated 
for each examinee. These ability estimates were analyzed using a 3 x 2 x 2
completely crossed analysis of variance in which testing strategy, KR condition,
and  pacing  of  item  presentation  were  independent  variables.  Means  and  standard 
deviations  for  each  experimental  treatment  combination  were  also  computed  for 
this  variable. 

Proportion correct. The proportion of items answered correctly was computed
for those students in the conventional test conditions. Means and standard
deviations  of  this  variable  were  calculated  in  all  KR  and  pacing  conditions. 
Proportion-correct  scores  also  served  as  a  dependent  variable  in  a  two-way  anal¬ 
ysis  of  variance  in  which  KR  and  pacing  of  item  presentation  were  independent 
variables  within  the  conventional  test  conditions. 

Mean  difficulty  correct.  Within  the  stradaptive  testing  condition  the  mean 
difficulty correct score was analyzed in a 2 x 2 x 2 crossed analysis of
variance in which each factor, namely stradaptive test condition (fixed vs.
variable length), KR condition, and pacing condition, had two levels. Means and standard
deviations  for  this  dependent  variable  were  also  computed  in  all  experimental 
treatment  combinations. 

Response  Pattern  Information 

Response  pattern  information  was  computed  for  each  individual  at  the  last 
iteration  of  the  maximum  likelihood  scoring  of  the  individual's  test  response 
data  and  served  as  the  dependent  variable  in  a  3  x  2  x  2  analysis  of  variance. 
Means  and  standard  deviations  of  this  variable  were  computed  for  each  combina¬ 
tion  of  experimental  conditions. 




Response  Latency 

Mean response latency across all items administered, and scores on
psychological reactions scales derived from the factor analysis, were dependent
variables in univariate 3 x 2 x 2 analyses of variance in which testing strategy,
KR condition, and pacing condition were independent variables. In addition,
students  in  the  KR  conditions  yielded  scores  on  a  KR  Reaction  Scale;  these  val¬ 
ues  were  analyzed  in  a  3  x  2  analysis  of  variance.  Means  and  standard  devia¬ 
tions  were  also  computed  on  all  variables  in  the  combined  test  strategy,  pacing, 
and  KR  conditions. 

Psychological  Reactions 

In order to further examine students' reactions to testing within
experimental groups, the percentages of students who chose each alternative
for each psychological reactions question were calculated for the total group and
for each experimental group. Chi-square tests of independence were computed to
identify reactions to testing at the single-question level which differed among
the experimental conditions. Comparisons of item responses were made between
three pairs of experimental conditions: KR versus no-KR condition, conventional
versus stradaptive testing strategy, and self- versus computer-paced condition
on non-KR items. For comparisons involving stradaptive and conventional test
strategies, data for fixed-length and variable-length stradaptive testing
strategies were combined. On KR Reaction Scale items, comparisons were made between
testing strategies and pacing conditions.
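
A chi-square test of independence of the kind described here can be sketched in a few lines of Python. The example computes only the Pearson statistic and its degrees of freedom for a table of response counts (rows for experimental conditions, columns for response alternatives). The function name and the counts are hypothetical and are not data from the study.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way
    table of observed counts, given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts: two conditions by two response alternatives.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
```

The statistic would then be referred to a chi-square distribution with the returned degrees of freedom to obtain a probability level.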

Finally,  to  examine  the  nature  of  the  relationships  among  the  dependent 
variables, intercorrelations were computed among the dependent variables, and
the  internal  consistency  reliability  of  the  psychological  reactions  scales  was 
determined  by  Cronbach's  alpha. 
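
For reference, Cronbach's alpha for a scale can be computed as in the following minimal Python sketch. The function name and the small score matrix are hypothetical. The example assumes complete data, at least two items, and nonzero variance in the total scores.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items on the scale

    def variance(xs):
        # Sample variance with n - 1 in the denominator.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical Likert-type responses: three respondents, two items.
alpha = cronbach_alpha([[1, 1], [2, 3], [3, 2]])
```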


RESULTS 


Ability Estimates

Maximum  likelihood  ability  estimates.  Table  1  shows  the  results  of  the 
three-way analysis of variance in which the effects of testing strategy, KR, and
pacing of item presentation on maximum likelihood ability estimates were
analyzed. Also shown are the means and standard deviations of the maximum
likelihood ability estimates in each of the experimental groups and the number of
subjects associated with them. As the results of the three-way analysis of
variance show, there was no significant effect on maximum likelihood ability
estimates due to testing strategy, KR condition, or pacing of item presentation; and
there were no significant interactions.

Proportion-correct  scores.  The  results  of  the  two-way  analysis  of  variance 
that  analyzed  the  effects  KR  and  pacing  condition  had  on  proportion-correct 
scores obtained in the conventional testing condition are presented in Table 2.
Also shown are means and standard deviations of the proportion-correct scores in
each of the experimental conditions; the analysis indicated that there was no
significant  effect  of  KR  or  pacing  condition  on  proportion-correct  scores,  nor 
was  there  a  significant  interaction. 
Table 1

Means and Standard Deviations of Maximum Likelihood Ability Estimates
for Conventional and Stradaptive Tests With and Without KR in
Computer- and Self-Paced Conditions, and Three-Way ANOVA Results

                                      Experimental Condition
                                  Self-Paced       Computer-Paced   Combined Conditions
Test and KR Condition             N   Mean     SD    N   Mean     SD    N   Mean     SD

Conventional
  KR                             39   -.13   1.22   39   -.32   1.44   78   -.23   1.33
  No-KR                          31   -.20   1.34   32   -.40   1.58   63   -.30   1.46
Stradaptive: Fixed Length
  KR                             41   -.18   1.13   41   -.28   1.01   82   -.23   1.06
  No-KR                          33   -.51    .94   33    .09   1.05   66   -.21   1.04
Stradaptive: Variable Length
  KR                             39   -.01    .88   40   -.09   1.05   79   -.05    .96
  No-KR                          33   -.17   1.04   33   -.11   1.13   66   -.14   1.11
Combined Groups
  Conventional                   70   -.16   1.26   71   -.36   1.50  141   -.26   1.38
  Stradaptive: Fixed Length      74   -.32   1.06   74   -.11   1.04  148   -.22   1.05
  Stradaptive: Variable Length   72   -.08    .98   73   -.10   1.08  145   -.09   1.03
  KR                            119   -.11   1.08  120   -.23   1.17  239   -.17   1.13
  No-KR                          97   -.30   1.13   98   -.14   1.27  195   -.22   1.21
Total Group                     216   -.19   1.11  218   -.19   1.22  434   -.19   1.16

Three-Way Analysis of Variance

Source of Variation              df   Mean Square       F    p(a)

Main Effects                      4           .64     .47    .760
  Test                            2          1.14     .84    .434
  KR                              1           .28     .20    .654
  Pacing                          1           .00     .00    .975
Two-Way Interactions              5          1.06     .78    .568
  Test x KR                       2           .12     .09    .914
  Test x Pacing                   2          1.48    1.09    .338
  KR x Pacing                     1          2.08    1.53    .217
Three-Way Interaction
  Test x KR x Pacing              2          1.26    .923    .398
Residual                        422          1.36
Total                           433          1.35

(a) Probability of error in rejecting null hypothesis.

Mean difficulty correct scores. Table 3 shows the three-way analysis of
variance and descriptive statistics when mean difficulty correct scores were
computed for items answered correctly. Although the KR x Pacing interaction
approached significance (p < .088), there were no other significant sources of
variance in the data.




Table 2

Means and Standard Deviations of Proportion-Correct Scores
for Conventional Test With and Without KR in Computer-
and Self-Paced Conditions, and Two-Way ANOVA Results

                                      Experimental Condition
                                  Self-Paced       Computer-Paced   Combined Conditions
KR Condition                      N   Mean     SD    N   Mean     SD    N   Mean     SD

KR                               42    .54    .23   42    .53    .21   84    .54    .21
No-KR                            34    .57    .24   35    .52    .24   69    .55    .24
Total Group                      76    .56    .23   77    .53    .22  153    .54    .23

Two-Way Analysis of Variance

Source of Variation              df   Mean Square       F    p(a)

Main Effects                      2           .02     .31    .735
  KR                              1           .00     .06    .802
  Pacing                          1           .03     .56    .457
Two-Way Interaction
  KR x Pacing                     1           .02     .31    .581
Residual                        149           .05
Total                           152           .05

(a) Probability of error in rejecting null hypothesis.


Response  Pattern  Information 

Means  and  standard  deviations  of  response  pattern  information  as  a  function 
of testing strategy, KR, and pacing conditions are shown in Table 4. The
results of the three-way analysis of variance are also shown. As Table 4
indicates, there was a significant main effect for testing strategy and a
significant Test x KR x Pacing interaction, which is plotted in Figure 1.

The  main  effect  for  testing  strategy  indicated  that  mean  response  pattern 
information  was  highest  (8.67)  in  the  fixed-length  stradaptive  testing  condi¬ 
tion,  next  highest  (6.44)  in  the  variable-length  stradaptive  testing  condition, 
and  lowest  (4.20)  in  the  conventional  testing  condition.  Post  hoc  analysis  in¬ 
dicated  that  mean  level  of  observed  information  of  the  conventional  test  was 
significantly  less  (£  <  .01)  than  either  of  the  stradaptive  tests  and  that  the 
fixed-length  stradaptive  test  was  significantly  higher  (p  <  .01)  than  the  varia¬ 
ble-length  stradaptive  test.  These  data  indicate  that  the  conventional  ...  t 
would  have  to  be  103  items  long  in  order  to  obtain  the  same  level  of  informa¬ 
tion/precision  as  did  the  50-item  fixed-length  stradaptive  test,  or  77  items 
long  to  measure  with  the  same  degree  of  precision/information  as  did  the  varia¬ 
ble-length  stradaptive  test  that  had  a  mean  test  length  of  approximately  27 
items  (SD  =  4.4). 
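
The equivalent-length figures above are consistent with treating test
information as proportional to test length. The following is a minimal sketch
of that arithmetic, not the authors' own procedure. It assumes a 50-item
conventional test (a length not stated in this passage, but the one the
reported figures imply), and the helper name is hypothetical.

```python
def equivalent_length(base_length, base_info, target_info):
    """Items a test of `base_length` items would need to reach `target_info`,
    assuming information grows in proportion to test length."""
    return base_length * target_info / base_info

CONVENTIONAL_LENGTH = 50   # assumption: implied by the reported figures
CONVENTIONAL_INFO = 4.20   # mean information, conventional test (Table 4)

# Mean information of the two stradaptive tests, as reported in the text
fixed = equivalent_length(CONVENTIONAL_LENGTH, CONVENTIONAL_INFO, 8.67)
variable = equivalent_length(CONVENTIONAL_LENGTH, CONVENTIONAL_INFO, 6.44)

print(round(fixed), round(variable))
```

Rounded to whole items, these ratios give 103 and 77, the values cited in the
text.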

The three-way interaction data in Figure 1 show different effects on mean
information as a function of KR and pacing conditions. KR and pacing conditions
had no effect on the information for the conventional test. For the
variable-length stradaptive test, slight differences in information levels were
observed for the KR conditions, but pacing conditions had no differential
effects. However, for the fixed-length stradaptive test, pacing and KR
conditions interacted with respect to information. Highest mean information
values were observed for the computer-paced no-KR condition, and lowest mean
information was observed for the computer-paced KR condition; mean information
values for the self-paced condition were intermediate between those for the
computer-paced condition, but in the self-paced condition KR had the opposite
effect.

Table 3

Means and Standard Deviations of Mean Difficulty Correct Scores
for Stradaptive Tests With and Without KR in Computer-
and Self-Paced Conditions, and Three-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test and KR Condition           N    Mean     SD    N    Mean     SD    N    Mean     SD
Stradaptive: Fixed Length
  KR                           41    -.14   1.15   41    -.22    .96   82    -.18   1.06
  No-KR                        34    -.51   1.01   33     .15   1.02   67    -.18   1.07
Stradaptive: Variable Length
  KR                           39    -.09    .87   40    -.10   1.09   79    -.10    .98
  No-KR                        33    -.21   1.13   33    -.15   1.09   66    -.18   1.10
Combined Groups
  Stradaptive
    Fixed Length               75    -.31   1.10   74    -.05   1.00  149    -.18   1.06
    Variable Length            72    -.14    .99   73    -.12   1.08  145    -.13   1.04
  KR                           80    -.12   1.02   81     .16   1.02  161    -.14   1.02
  No-KR                        67    -.36   1.08   66     .01   1.06  133    -.18   1.08
Total Group                   147    -.23   1.05  147    -.09   1.04  294    -.16   1.05

Three-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    3          .58      .54    .658
  Test                          1          .18      .16    .689
  KR                            1          .11      .10    .751
  Pacing                        1         1.46     1.34    .249
Two-Way Interactions            3         1.46     1.34    .263
  Test x KR                     1          .14      .13    .719
  Test x Pacing                 1         1.07      .98    .323
  KR x Pacing                   1         3.20     2.93    .088
Three-Way Interaction
  Test x KR x Pacing            2         2.09     1.91    .167
Residual                      286         1.09
Total                         293         1.09

^a Probability of error in rejecting null hypothesis.

Table 4

Means and Standard Deviations of Response Pattern Information
for Conventional and Stradaptive Tests With and Without KR in
Computer- and Self-Paced Conditions, and Three-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test and KR Condition           N    Mean     SD    N    Mean     SD    N    Mean     SD
Conventional
  KR                           35    4.21   2.19   36    4.48   1.92   71    4.35   2.04
  No-KR                        28    4.01   2.41   28    4.03   2.38   56    4.02   2.37
Stradaptive: Fixed Length
  KR                           40    9.18   2.92   38    7.18   2.93   78    8.21   3.07
  No-KR                        33    8.22   2.90   33   10.20   4.71   66    9.21   4.01
Stradaptive: Variable Length
  KR                           38    5.82   3.57   40    5.65   4.25   78    5.73   3.91
  No-KR                        31    6.70   4.54   32    6.18   5.23   63    6.44   4.87
Combined Groups
  Conventional                 63    4.11   2.25   64    4.25   2.15  127    4.20   2.19
  Stradaptive
    Fixed Length               73    8.20   2.91   63    8.69   3.82  144    8.67   3.56
    Variable Length            69    6.26   4.05   72    5.91   4.74  141    6.44   4.87
  KR                          113    6.51   3.60  114    5.79   3.37  227    6.15   3.50
  No-KR                        92    6.43   3.80   93    6.96   5.02  185    6.70   4.45
Total Group                   205    6.47   3.70  207    6.38   4.19  412    6.43   3.98

Three-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    4       349.09    28.67    .001
  Test                          2       681.75    55.99    .001
  KR                            1        24.71     2.03    .155
  Pacing                        1         1.47      .12    .728
Two-Way Interactions            5        14.53     1.19    .311
  Test x KR                     2        16.06     1.32    .269
  Test x Pacing                 2         2.12      .18    .840
  KR x Pacing                   1        36.68     3.01    .083
Three-Way Interaction
  Test x KR x Pacing            2        53.26     4.37    .013
Residual                      400        12.18
Total                         411        15.68

^a Probability of error in rejecting null hypothesis.


Response  Latency 


Means and standard deviations for the average latency over all items as a
function of testing strategy, KR, and pacing conditions are presented in Table
5. The results of the three-way analysis of variance are also shown. As can be
seen in Table 5, there was a significant main effect on mean latency for
testing strategy. No other main effects or interactions were significant.

Figure 1

Mean Response Pattern Information as a Function of
Testing Strategy, KR, and Pacing Conditions

Average response time per item was longest (15.79 sec.) in the variable-length
stradaptive test, shortest (14.22 sec.) in the conventional test condition, and
intermediate (14.75 sec.) in the fixed-length stradaptive testing condition.
Post hoc analysis indicated that there was a significant difference (p < .01)
in average latency between the variable-length stradaptive and conventional
testing conditions.




Table 5

Means and Standard Deviations Over All Items of Average Response Latencies
for Conventional and Stradaptive Tests With and Without KR in
Computer- and Self-Paced Conditions, and Three-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test and KR Condition           N    Mean     SD    N    Mean     SD    N    Mean     SD
Conventional
  KR                           42   14.18   3.66   42   14.56   5.05   84   14.37   5.33
  No-KR                        34   13.97   5.67   35   14.11   4.08   69   14.04   4.89
Stradaptive: Fixed Length
  KR                           41   13.42   3.72   41   14.74   5.44   82   14.08   4.68
  No-KR                        34   15.43   5.78   33   15.73   5.14   67   15.58   5.44
Stradaptive: Variable Length
  KR                           39   13.22   4.52   40   15.50   4.63   79   15.36   4.55
  No-KR                        33   16.73   6.17   33   15.86   5.72   66   16.30   5.92
Combined Groups
  Conventional                 76   14.09   5.63   77   14.36   4.61  153   14.22   5.12
  Stradaptive
    Fixed Length               73   14.33   4.84   74   15.19   5.29  149   14.75   5.07
    Variable Length            72   15.91   5.36   73   15.66   5.12  145   15.79   5.22
  KR                          122   14.25   4.73  123   14.93   5.03  245   14.59   4.88
  No-KR                       101   13.36   5.93  101   15.21   3.02  202   15.29   5.48
Total Group                   223   14.76   5.32  224   15.06   5.02  447   14.91   5.17

Three-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    4        62.52     2.36    .053
  Test                          2        93.07     3.51    .031
  KR                            1        53.05     2.00    .158
  Pacing                        1        10.09      .38    .338
Two-Way Interactions            5        21.27      .80    .549
  Test x KR                     2        32.89     1.24    .291
  Test x Pacing                 2        11.22      .42    .655
  KR x Pacing                   1        17.70      .67    .415
Three-Way Interaction
  Test x KR x Pacing            2         2.53      .08    .919
Residual                      435        26.55
Total                         446        26.70

^a Probability of error in rejecting null hypothesis.


Psychological  Reactions 

Motivation. The means, standard deviations, and three-way analysis of
variance are presented in Table 6 for reported motivation level as a function
of testing strategy, KR, and pacing conditions. There was no main effect on
reported motivation level due to testing strategy or KR condition. There was,
however, a significant KR x Pacing interaction, which is plotted in Figure 2.
The figure shows that reported motivation was high under computer-paced
conditions when KR was given, but low under no-KR conditions. In the self-paced
condition, however, the opposite relationship was found. When tests were
self-paced, motivation was lower under KR than under no-KR conditions.

Table 6

Means and Standard Deviations of Motivation Scores for
Conventional and Stradaptive Tests With and Without KR in
Computer- and Self-Paced Conditions, and Three-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test and KR Condition           N    Mean     SD    N    Mean     SD    N    Mean     SD
Conventional
  KR                           42     .08   2.04   42     .12   2.44   84     .10   2.23
  No-KR                        34    -.09   2.18   35    -.46   2.36   69    -.28   2.26
Stradaptive: Fixed Length
  KR                           41    -.52   2.20   41     .24   2.28   82    -.14   2.26
  No-KR                        34     .17   2.05   33    -.30   2.12   67    -.06   2.08
Stradaptive: Variable Length
  KR                           39     .22   1.66   40    -.01   2.05   79     .11   1.86
  No-KR                        33     .19   1.55   33    -.78   2.11   66    -.29   1.90
Combined Groups
  Conventional                 76     .01   2.09   77    -.14   2.41  153    -.07   2.25
  Stradaptive
    Fixed Length               73    -.21   2.15   74     .00   2.21  149    -.10   2.18
    Variable Length            72     .21   1.60   73    -.36   2.09  145    -.08   1.88
  KR                          122    -.08   1.99  123     .12   2.25  245     .02   2.12
  No-KR                       101     .09   1.93  101    -.51   2.19  202    -.21   2.08
Total Group                   223    -.00   1.96  224    -.16   2.24  447    -.08   2.11

Three-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    4         2.29      .52    .725
  Test                          2          .06      .01    .987
  KR                            1         6.04     1.36    .244
  Pacing                        1         3.02      .68    .410
Two-Way Interactions            5         6.72     1.51    .184
  Test x KR                     2         2.58      .58    .560
  Test x Pacing                 2         5.50     1.24    .291
  KR x Pacing                   1        17.29     3.89    .049
Three-Way Interaction
  Test x KR x Pacing            2         1.61      .36    .696
Residual                      435         4.44
Total                         446         4.43

^a Probability of error in rejecting null hypothesis.

Figure 2

Mean Motivation Scores as a Function of
KR and Pacing Conditions

[Plotted lines: Computer-Paced and Self-Paced.]


Figure 2 also indicates that there was little difference in motivation
between computer- and self-paced conditions when KR was given. In post hoc
analysis of the mean motivation scores, the sum of squares of the two pacing
means under KR conditions was compared in an F ratio to the residual sum of
squares with 1 and 435 df. In a similar manner, the sum of squares of the two
pacing means under no-KR conditions was compared to the residual sum of
squares. This analysis indicated that there was a significant difference among
no-KR means (F = 4.094, p < .05) but not among KR means (F = .5518). Thus, when
students received feedback on test performance, the average levels of
motivation they reported were relatively high and were similar regardless of
the pacing of item presentation. Under no-KR conditions, however, motivation
varied greatly and significantly as a function of pacing condition.
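
The post hoc comparison described above can be approximated from the group
sizes and means printed in Table 6. The sketch below is one plausible reading
of the procedure, offered as an assumption rather than the authors' actual
computation: it forms a weighted between-groups sum of squares for the two
pacing means and divides the resulting mean square by the residual mean
square from the overall analysis. The function name is hypothetical.

```python
def between_groups_f(groups, ms_residual):
    """F ratio for differences among group means.

    groups: list of (n, mean) pairs.
    ms_residual: residual mean square from the overall ANOVA table.
    """
    total_n = sum(n for n, _ in groups)
    grand_mean = sum(n * mean for n, mean in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean in groups)
    df_between = len(groups) - 1
    return (ss_between / df_between) / ms_residual

MS_RESIDUAL = 4.44  # residual mean square from Table 6 (435 df)

# (N, mean motivation) for the self-paced and computer-paced groups, Table 6
f_kr = between_groups_f([(122, -0.08), (123, 0.12)], MS_RESIDUAL)
f_no_kr = between_groups_f([(101, 0.09), (101, -0.51)], MS_RESIDUAL)

print(f"KR: F = {f_kr:.4f}; no-KR: F = {f_no_kr:.3f}")
```

Because the inputs are the rounded means from Table 6, the results agree with
the reported F = .5518 and F = 4.094 only to within rounding.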

Table 7 shows the percentage of students selecting each alternative of the
Motivation Scale items in each KR condition, testing strategy (conventional vs.
stradaptive), and pacing condition and in the total sample, and the results of
chi-square tests of independence within experimental conditions. In general,
students reported a relatively high level of motivation as assessed by the
three items that defined the Motivation Scale. Approximately 60% indicated on
Question 6 that they "almost always" were careful to select the best
alternative to a question. When asked if they were challenged to do well on the
test (Question 13), nearly three-quarters of the students replied that they
were "fairly much" or "very much" challenged. About 70% of the sample indicated
that they cared "some" or "a lot" about how well they did on the experimental
test (Question 18). There were no statistically significant differences between
KR, testing, or pacing conditions on any of the Motivation Scale items.

Table 7

Response Percentages for Motivation Questions as a Function of
KR Condition, Testing Strategy, and Pacing Condition, and for Total Group

Probability of error in rejecting null hypothesis of independence, based on
chi-square analysis.

Anxiety. Means and standard deviations of reported anxiety level as a
function of each test, KR, and pacing combination are presented in Table 8.
Also shown is the three-way analysis of variance. The analysis indicates that
mean anxiety scores yielded a significant three-way (Test x KR x Pacing)
interaction. A diagram of the interaction is in Figure 3. In comparison with
those in the KR condition, students not receiving KR reported higher anxiety in
taking computer-paced variable-length stradaptive and conventional tests and in
both self-paced stradaptive tests. Lower levels of anxiety in no-KR conditions
were reported in the computer-paced conventional testing conditions. Students
receiving KR, however, reported about the same level of anxiety regardless of
testing conditions.

In post hoc analysis of the mean anxiety scores, the sum of squares of the
six Test x Pacing means under KR conditions was compared in an F ratio to the
residual sum of squares with 5 and 435 df, and the sum of squares of the six
Test x Pacing means under no-KR conditions was compared to the residual sum of
squares. This analysis showed that the differences among the six KR means were
not statistically significant (F = .339), whereas the differences among no-KR
means were statistically significant (F = 2.96, p < .05). A difference between
any pair of mean anxiety scores of 1.52 or greater was statistically
significant in the no-KR condition. These data show, therefore, that mean
anxiety scores did not differ significantly in the no-KR condition as a
function of pacing conditions for either of the stradaptive tests but that a
significant difference did occur as a result of pacing (in the no-KR condition)
for the conventional test. Thus, there was no significant variation in mean
anxiety scores among testing conditions when students received KR; but when KR
was not provided, levels of anxiety varied significantly as a function of
testing condition, with significant differences occurring only for the
conventional test as a function of pacing conditions.
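
The six-mean comparison above can be illustrated with the cell sizes and means
printed in Table 8. The following is a hedged sketch, not the authors'
documented computation: it assumes the test is a weighted between-cells mean
square (5 df) divided by the residual mean square of the overall analysis, and
the names used are hypothetical.

```python
def cell_means_f(cells, ms_residual):
    """F ratio for differences among k cell means (k - 1 and residual df).

    cells: list of (n, mean) pairs, one per Test x Pacing cell.
    ms_residual: residual mean square from the overall ANOVA table.
    """
    total_n = sum(n for n, _ in cells)
    grand_mean = sum(n * mean for n, mean in cells) / total_n
    ss_cells = sum(n * (mean - grand_mean) ** 2 for n, mean in cells)
    return (ss_cells / (len(cells) - 1)) / ms_residual

MS_RESIDUAL = 8.44  # residual mean square from Table 8 (435 df)

# (N, mean anxiety) in the order: conventional, fixed-length, variable-length;
# self-paced cell first, then computer-paced cell (values from Table 8)
kr_cells = [(42, -3.15), (42, -3.38), (41, -3.62),
            (41, -2.99), (39, -3.67), (40, -3.26)]
no_kr_cells = [(34, -4.49), (35, -2.46), (34, -3.28),
               (33, -4.19), (33, -2.47), (33, -2.98)]

f_kr = cell_means_f(kr_cells, MS_RESIDUAL)
f_no_kr = cell_means_f(no_kr_cells, MS_RESIDUAL)

print(f"KR: F = {f_kr:.3f}; no-KR: F = {f_no_kr:.2f}")
```

Under these assumptions the two ratios come out close to the reported
F = .339 and F = 2.96, which suggests the reading is at least compatible with
the published values.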

Table 9 shows the percentage of students in each experimental condition and
in the total group who chose each item alternative to the four anxiety items,
and the results of chi-square tests within experimental conditions. Overall,
students reported a low level of anxiety. Approximately 68% of the total sample
reported on Question 4 that they did not worry "at all" or worried "somewhat"
during testing. When asked if they were nervous while taking the test (Question
7), about 60% answered that they were not nervous at all. Most students (45%)
indicated on Question 11 that they were relaxed during testing, but some (36%)
reported that they were neither tense nor relaxed. Approximately 92% of the
total group expressed doubt that nervousness prevented them from doing well on
the test (Question 16). The only statistically significant difference was
observed on Question 11, between the KR conditions. Students in the KR group
tended to report lower levels of being "tense" or "very tense" than did those
in the no-KR group.

Table 8

Means and Standard Deviations of Anxiety Scores for
Conventional and Stradaptive Tests With and Without KR in
Computer- and Self-Paced Conditions, and Three-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test and KR Condition           N    Mean     SD    N    Mean     SD    N    Mean     SD
Conventional
  KR                           42   -3.15   2.69   42   -3.38   2.76   84   -3.26   2.71
  No-KR                        34   -4.49   2.43   35   -2.46   3.23   69   -3.46   3.02
Stradaptive: Fixed Length
  KR                           41   -3.62   2.98   41   -2.99   3.10   82   -3.31   3.03
  No-KR                        34   -3.28   2.55   33   -4.19   2.72   67   -3.73   2.66
Stradaptive: Variable Length
  KR                           39   -3.67   3.07   40   -3.26   2.50   79   -3.46   2.79
  No-KR                        33   -2.47   3.40   33   -2.98   3.32   66   -2.73   3.35
Combined Groups
  Conventional                 76   -3.75   2.64   77   -2.96   3.00  153   -3.35   2.85
  Stradaptive
    Fixed Length               75   -3.47   2.78   74   -3.53   2.98  149   -3.50   2.87
    Variable Length            72   -3.12   3.26   73   -3.14   2.88  145   -3.13   3.07
  KR                          122   -3.48   2.90  123   -3.21   2.78  245   -3.34   2.84
  No-KR                       101   -3.42   2.91  101   -3.20   3.16  202   -3.31   3.03
Total Group                   223   -3.45   2.90  224   -3.21   2.95  447   -3.33   2.92

Three-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    4         4.25      .50    .733
  Test                          2         5.05      .60    .550
  KR                            1          .12      .01    .906
  Pacing                        1         6.69      .79    .374
Two-Way Interactions            5         9.04     1.07    .376
  Test x KR                     2        13.82     1.64    .196
  Test x Pacing                 2         8.79     1.04    .354
  KR x Pacing                   1          .05      .01    .940
Three-Way Interaction
  Test x KR x Pacing            2        38.86     4.60    .011
Residual                      435         8.44
Total                         446         8.55

^a Probability of error in rejecting null hypothesis.

Figure 3

Mean Anxiety Scores as a Function of
Testing Strategy, KR, and Pacing Conditions

[Plot: mean anxiety (Low to High) for the fixed-length stradaptive,
variable-length stradaptive, and conventional tests, each under computer-paced
and self-paced conditions.]

Table 9

Response Percentages for Anxiety Questions as a Function of KR Condition,
Testing Strategy, and Pacing Condition, and for Total Group

Difficulty perception. Means and standard deviations for difficulty
perception scores as a function of test, KR, and pacing conditions are
presented in Table 10; also shown are the results from the three-way analysis
of variance. There were no significant main effects for test, KR, or pacing
conditions on perception of test difficulty. The Test x KR interaction,
however, approached significance (p < .086). The Test x KR interaction
indicated that when KR was provided, students taking the conventional test
perceived it to be less difficult (mean = .26) than students not receiving KR
(mean = 1.79). Essentially equal levels of difficulty perception were reported
by stradaptive testees under KR and no-KR conditions, with mean levels of
difficulty perception of .86 and 1.10, respectively.

Table 10

Means and Standard Deviations of Difficulty Perception Scores
for Conventional and Stradaptive Tests With and Without KR in
Computer- and Self-Paced Conditions, and Three-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test and KR Condition           N    Mean     SD    N    Mean     SD    N    Mean     SD
Conventional
  KR                           42     .04   4.66   42     .47   4.56   84     .26   4.58
  No-KR                        34     .88   4.99   35    2.69   4.75   69    1.79   4.92
Stradaptive: Fixed Length
  KR                           41     .69   2.89   41    1.32   2.78   82    1.01   2.83
  No-KR                        34     .62   3.21   33    1.11   3.53   67     .86   3.36
Stradaptive: Variable Length
  KR                           39    1.40   3.23   40     .81   3.17   79    1.10   3.19
  No-KR                        33     .44   3.42   33    1.40   3.45   66     .92   3.44
Combined Groups
  Conventional                 76     .41   4.79   77    1.48   4.75  153     .95   4.78
  Stradaptive
    Fixed Length               75     .66   3.02   74    1.22   3.12  149     .94   3.07
    Variable Length            72     .96   3.33   73    1.07   3.29  145    1.02    3.3
  KR                          122     .69   3.70  123     .86   3.58  245     .78   3.64
  No-KR                       101     .65   3.92  101    1.75   3.99  202    1.20   3.99
Total Group                   223     .67   3.80  224    1.26   3.79  447     .97   3.80

Three-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    4        14.74     1.02    .394
  Test                          2          .23      .02    .984
  KR                            1        19.67     1.37    .243
  Pacing                        1        38.89     2.70    .101
Two-Way Interactions            5        22.30     1.55    .173
  Test x KR                     2        35.39     2.46    .086
  Test x Pacing                 2         8.31      .58    .561
  KR x Pacing                   1        23.73     1.65    .200
Three-Way Interaction
  Test x KR x Pacing            2         7.91      .55    .577
Residual                      435        14.38
Total                         446        14.44

^a Probability of error in rejecting null hypothesis.

Table 11 presents the percentages of the experimental groups and of the
total group selecting each response alternative on the six Difficulty
Perception Scale items, and the results of the chi-square tests within
experimental conditions. In general, most students felt that the test items
were seldom easy and frequently too hard (Question 1) and that the test was too
difficult in relation to their vocabulary ability (Question 2). Most (54%) felt
"somewhat" frustrated by the difficulty of the test questions (Question 12).
The chi-square analyses show that on every Difficulty Perception Scale item,
there was a significant difference between conventional and stradaptive testing
conditions. Inspection of the distribution of percentages indicates that
students in the stradaptive testing conditions perceived their test as being of
more appropriate difficulty than students taking the conventional test. The
conventional test was more often perceived as being either too easy or too hard
by the examinees. There were no significant differences in item response
distributions within the KR or pacing conditions.

Table 11

Response Percentages for Difficulty Perception Questions as a Function of
KR Condition, Testing Strategy, and Pacing Condition, and for Total Group

KR  Reaction.  Table  12  shows  the  means,  standard  deviations,  and  two-way 
analysis  of  variance  for  KR  reaction  as  a  function  of  pacing  and  test  strategy 
conditions.  The  table  indicates  no  significant  main  effects  or  interactions 
among  these  variables  for  KR  reaction. 

Table 12

Means and Standard Deviations of KR Reaction Scores
for Conventional and Stradaptive Tests in Computer-
and Self-Paced Conditions, and Two-Way ANOVA Results

                                     Experimental Condition
                                 Self-Paced        Computer-Paced         Combined
Test                            N    Mean     SD    N    Mean     SD    N    Mean     SD
Conventional                   42    -.73   1.59   42    -.45   1.03   84    -.59   1.34
Stradaptive
  Fixed Length                 41    -.81   1.70   41    -.63    .96   82    -.72   1.38
  Variable Length              39    -.46    .98   40    -.48   1.42   79    -.52   1.22
Total Group                   122    -.67   1.46  123    -.55   1.14  245    -.61   1.31

Two-Way Analysis of Variance

Source of Variation            df  Mean Square        F     p^a
Main Effects                    3          .85      .49    .689
  Test                          2          .83      .48    .621
  Pacing                        1          .89      .51    .474
Two-Way Interaction
  Test x Pacing                 2          .85      .49    .614
Residual                      239         1.73
Total                         244         1.72

^a Probability of error in rejecting null hypothesis.

Items assessing the reactions to feedback were administered to students in
the KR condition. Table 13 gives the percentage of students in the KR condition
selecting each alternative of the five KR Reaction Scale questions. Overall,
the reaction to feedback was very favorable. Approximately 80% of the
KR-condition students indicated that feedback made testing much more
interesting (Question 19) and that they were interested in knowing whether
their answers were right or wrong (Question 24). About 80% of the students
indicated that feedback neither interfered with their ability to concentrate on
the test nor made them nervous (Question 20). About 93% of the KR-condition
students said they liked getting the feedback (Question 26). Chi-square
analysis of the KR Reaction Scale questions indicated that of the students
receiving KR, students in the computer-paced condition were more often "very
interested" in knowing whether their answers were right or wrong (Question 24).
There were no other significant differences in response distributions within
the experimental conditions.


Table 13

Response Percentages of KR Reaction Items as a Function of
Test and Pacing Conditions, and for Total Group

Other psychological reactions questions. The eight questions that were not
included in the other four psychological reactions scales are shown in Table
14. Also shown are the percentage of students in the KR, testing strategy, and
pacing conditions as well as in the total group who selected each alternative
of the items, and the results of the chi-square tests within experimental
conditions. The chi-square results for the non-KR items were based on data from
446 students, and the chi-square analyses for the KR items were based on data
from 152 students.

Three items showed significant differences for the KR conditions and one
for testing strategies, but no significant chi-squares were observed between
pacing conditions. More students (25.5%) taking the conventional test thought
that the difficulty of the test was "seldom" or "never" right for someone of
their ability (Question 3), but only 12.9% of the students taking the
stradaptive test responded in these categories. More students (20%) receiving
KR felt that they could have done better on the test if they had tried harder
(Question 9) than students not receiving KR (10.9%). When students receiving KR
responded to a question for which they didn't have an answer (Question 14),
90.2% said that they chose the most reasonable choice, whereas only 70.3% of
students not receiving KR responded this way; more students in the no-KR
condition (28.7%) answered such an item with the question mark key than did
students who received KR (9.0%). Students receiving feedback also thought that
they had done better on the test than did students not receiving KR (Question
15).

Intercorrelations among Dependent Variables and Reliabilities

Table 15 shows the intercorrelations, levels of significance, and the
numbers of students on which the correlations were based; internal consistency
reliabilities are also shown for the four psychological reactions scales. Some
variables were measured only under certain conditions, and for this reason
correlations were based on differing numbers of subjects. Reaction to KR, for
example, was obtained only from those in the KR condition. Since no students
were administered both a stradaptive and a conventional test, there are no
correlations between conventional test proportion-correct scores and
stradaptive test mean difficulty correct scores.

In general, the results indicate that ability estimates correlated
positively with scores on the Motivation Scale and negatively with the scores
on the Anxiety and Difficulty Perception Scales. For example, the high-ability
examinee reported higher levels of motivation and lower levels of anxiety and
perceived the test to be less difficult than did the student of lower ability.
Response pattern information, a measure of score precision, correlated
positively with all ability estimates (r = .46 to .55), indicating that lower
ability scores were less precise and that higher ability scores were more
precise. Reported motivation tended to correlate positively with ability
estimates (r = .18 to .37), whereas reported anxiety had low negative
correlations with ability estimates (r = -.11 to -.16). That students were able
to accurately perceive the difficulty of the conventional test is reflected in
the correlation of -.77 between proportion-correct scores and the scores on the
Difficulty Perception Scale.

Correlations among the psychological reactions scales showed that persons
who perceived the test to be difficult also tended to have high levels of
reported anxiety (r = .25). Reported motivation had a slight negative
correlation with scores on the Difficulty Perception Scale. There was a slight
but significant positive relationship (r = .13) between reported anxiety and
motivation level. Reported anxiety also had a moderate negative correlation
(r = -.40) with reaction to feedback. That is, the higher the level of anxiety,
the less students liked getting feedback.

Table 15 also shows that the Difficulty Perception Scale, composed of six
items, had an alpha coefficient of .84, whereas the four-item Anxiety Scale had
a reliability of .73. Both the Motivation Scale, composed of three items, and
the five-item KR Reaction Scale had alpha reliabilities of .61.
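
As an illustration of how an alpha coefficient of this kind is obtained, the
following minimal sketch computes coefficient alpha from a matrix of item
scores. The function name and the five-student, three-item data set are
hypothetical and are not taken from the study; the sketch assumes complete
data and equally weighted items.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (summed) scale score.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (1-5) from five students on a three-item scale.
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(example_scores)
```

Other things being equal, scales with fewer items tend to yield lower alpha
values, which may partly account for the lower reliability of the three-item
Motivation Scale.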


DISCUSSION  AND  CONCLUSIONS 

In prior research that investigated the use of feedback in adaptive testing
(Betz & Weiss, 1976a, 1976b; Prestwood & Weiss, 1978), provision of feedback was
confounded with rate of item presentation. The present research dealt with the
unconfounded effects of KR and pacing, as well as testing strategy, on test
performance and on psychological reactions to testing, including test anxiety
and motivation.

Results concerning the effect of feedback on test performance in adaptive
testing have varied. The present research and a prior study (Prestwood & Weiss,
1978) did not replicate the finding that test performance was higher under KR
than under no-KR conditions (Betz & Weiss, 1976a). The original study (Betz &
Weiss, 1976a) differed from subsequent studies in that ability estimates were
based on a combined group composed of low- and high-ability college students.
The present research and the Prestwood and Weiss (1978) study, however, based
ability estimates on groups composed only of high-ability college students. A
significant increase in mean test performance under KR conditions has been
demonstrated only when a relatively large range of college ability was tested.

The time a student spent solving each item was based on item latencies
averaged across all items. These average item latencies were found to differ by
type of testing strategy. Students took significantly longer (an average of 1.5
seconds per item) to solve items on the variable-length stradaptive test than on
the conventional test. However, this difference may be due to variations in item
difficulty on these tests. The mean difficulty of the conventional test was
b = .02, whereas the mean difficulties of the items administered in the
fixed-length and variable-length stradaptive tests were b = .16 and b = .26,
respectively. Thus, as might be expected, students taking the more difficult
variable-length stradaptive test took longer, on the average, to respond to an
item than did students taking the tests composed of easier items.

The longer latency of the stradaptive test was also found by Waters (1977) in
comparison to a peaked conventional test but was not found by Betz (1976a). The
difference in the findings was due to the difficulty of the conventional test in
comparison to the stradaptive test for the particular ability of the groups
tested. The conventional test used by Waters (1977) was easier in comparison to
the stradaptive test than was the conventional test employed by Betz (1976a) for
their respective groups of examinees. Thus, the latency results depend on the
particular conventional test employed and also on the ability level of the group
to which it is administered. Generally, however, the adaptive test will tend to
administer items very near to the ability level of the examinee so that, in
comparison to the conventional test, this may be a more or less difficult test
for a given set of examinees. Although item latency data showed significant
differences between stradaptive and conventional tests in the present research,
in practical terms the difference in mean total testing time for a 40-item test
would be approximately 1 minute.
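
The 1-minute figure follows from the per-item latency difference reported
above. A minimal sketch of the arithmetic is given below; the function name is
hypothetical, and the calculation assumes that the 1.5-second mean difference
applies uniformly to every item of a 40-item test.

```python
def extra_testing_time(latency_diff_sec, n_items):
    """Additional total testing time, in seconds, for a given per-item difference."""
    return latency_diff_sec * n_items


# Figures quoted in the text: 1.5 seconds per item, 40-item test.
extra_seconds = extra_testing_time(1.5, 40)
extra_minutes = extra_seconds / 60
```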

Although the stradaptive tests took slightly longer to administer, the
response pattern information data showed that they provided substantially more
information than did the conventional tests. With test length equal to that of
the conventional tests, the fixed-length stradaptive test provided measurements
that were, on the average, more than twice as precise as those of the
conventional test. This result can be translated directly into test length
savings of more than 30% to attain levels of precision equal to those of the
conventional test. Even more precise measurements were obtained by the
variable-length stradaptive test, which obtained measurements with mean
information more than twice that of the conventional test while administering
almost 50% fewer items. This indicates that a variable-length stradaptive test
would require only about 17.5 items to achieve the same average level of
measurement precision as the 40-item peaked conventional test. These results are
consistent with both earlier live testing and simulation studies demonstrating
the measurement superiority of the variable-length stradaptive test (e.g.,
Thompson, 1980; Vale, 1977; Vale & Weiss, 1975a, 1975b).
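
The relation between information and test length can be sketched as follows,
under the simplifying assumption (not stated in this report) that test
information accumulates roughly in proportion to the number of items
administered. The function names are hypothetical, and the relative efficiency
of 2.0 in the first call is an illustrative value rather than a result of the
study; the 40-item and 17.5-item figures are those quoted above.

```python
def items_needed(n_conventional, relative_efficiency):
    """Items an adaptive test needs to match a conventional test's information,
    assuming information grows in proportion to the number of items."""
    if relative_efficiency <= 0:
        raise ValueError("relative efficiency must be positive")
    return n_conventional / relative_efficiency


def implied_efficiency(n_conventional, n_adaptive):
    """Per-item information ratio implied by two equally precise test lengths."""
    return n_conventional / n_adaptive


# Illustrative value: a test twice as informative per item needs half the items.
half_length = items_needed(40, 2.0)

# Figures quoted in the text: 17.5 adaptive items versus 40 conventional items.
ratio = implied_efficiency(40, 17.5)
savings = 1 - 17.5 / 40
```

Under this assumption, the 17.5-item figure corresponds to a per-item
information ratio of roughly 2.3 and to a reduction in test length of somewhat
more than half.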

In a number of studies (Betz & Weiss, 1976b; Pine, Church, Gialluca, & Weiss,
1979; Prestwood & Weiss, 1978), there has been no effect on test anxiety due to
feedback. Similarly, in the present study, there was no decrease in mean anxiety
as assessed by the Anxiety Scale when students received feedback. It may be that
the volunteer experimental subjects did not have enough interest in performing
well on the test to become test-anxious. College students taking an experimental
test may have a low level of test-taking arousal. The effects of feedback on
anxiety in a motivated context (for example, the classroom) may differ from
those found in an experimental setting; or it may be that test anxiety is a
stable expectation of performance for a person and is fairly resistant to
testing conditions, such as type of test or administration of feedback.
Performance and anxiety level could possibly be affected by altering the quality
of feedback, i.e., by using a relatively easier test adapted to the individual's
ability. All students would receive a relatively easier test, and the positive
feedback could lead to better test performance and lowered anxiety. It may be
that students should be grouped into high- and low-anxiety groups (if anxiety is
a stable person characteristic) as well as high- and low-ability groups, and
that the effects of feedback under high- and low-difficulty testing conditions,
feedback conditions, and conventional vs. adaptive conditions should be
examined. Adaptive testing may have its own motivating effect, since through
subjective KR students may perceive the difficulty of the test differently from
conventional tests.

The significant three-way KR x Pacing x Testing Strategy interaction indicates
that anxiety interacts in a complex way with testing conditions. It is
interesting, however, that when KR was provided, the reported level of anxiety
did not vary significantly as a function of testing condition; i.e., students
receiving KR reported about the same level of anxiety in each of the testing
conditions. When feedback on performance was withheld, students reported anxiety
levels that varied significantly as a function of testing condition, with
significant differences occurring only for the conventional test. These data
suggest that feedback may standardize testing conditions with respect to
test-taking anxiety and that under no-feedback conditions, students'
motivational reactions to conventional tests are more susceptible to the
influence of test administration conditions than are their motivational
reactions to adaptive tests.


Research has consistently shown that, unlike anxiety, reported motivation
varies with the provision of feedback (Betz & Weiss, 1976b; Pine, Church,
Gialluca, & Weiss, 1979; Prestwood & Weiss, 1978). Betz and Weiss (1976b) found
a significant Ability Group x KR interaction, which indicated differences in
motivation between low-ability and high-ability students attributable to the
provision of feedback. In the high-ability group, motivation was higher under KR
than under no-KR conditions, whereas in the low-ability group, motivation was
lower under KR than under no-KR conditions. Postulating that motivation
increased when the proportion of positive feedback increased, Prestwood and
Weiss (1978) studied the joint effects of provision of KR, test difficulty
(proportion of positive feedback), and testing strategy on test performance and
on test-taking reactions with students of high ability. The examinees reported
higher motivation when KR was provided than when it was not. Thus, higher
ability students in both studies reported higher motivation when KR was
provided.

In the Prestwood and Weiss (1978) study there was no effect of test difficulty
on reported motivation nor was any interaction significant. This would indicate
that it is not merely the quality of feedback (positive or negative) that
determines motivation to perform well but the examinee's reaction to feedback.
Positive or negative reaction to feedback may be determined in part by a history
of academic successes or failures. In other words, two examinees may receive the
same amount of negative feedback but will react differently because of
differences in academic history. This conclusion was partially supported by a
finding reported in Pine, Church, Gialluca, and Weiss (1979) that Black
examinees were less motivated under KR conditions than White examinees, who were
more motivated under KR conditions. Although the study did not control for
proportion of positive feedback, it may be that Black students reacted less
favorably to feedback than White students due to differences in academic
history.

Although the present study did not replicate the positive relationship between
KR and reported test-taking motivation, it did indicate that motivation level
varied as a result of the interaction of feedback with pacing of item
presentation. The interaction indicated that the highest motivation was reported
under computer-paced KR conditions, whereas the lowest motivation was reported
under computer-paced no-KR conditions. Such a finding indicates that the effect
of KR on motivation is modified by other testing conditions with which feedback
is paired. Differences in empirical results dealing with the effect feedback has
on test-taking motivation may stem from variation in the testing conditions
(such as pacing of item administration, test difficulty, and testing strategy)
with which feedback has been paired. As with the anxiety interaction, the
possible standardizing effect of feedback was evident in the motivation
interaction. Levels of reported motivation were not significantly different
between computer-
and self-paced conditions when feedback was provided, but statistically
significant differences were observed between pacing conditions when feedback
was not provided. Unlike the Anxiety Scale scores, however, the significant
differences for the Motivation Scale scores did not occur only on the
conventional tests.

Chi-square analysis of post-questionnaire attitude-assessment items indicated
that feedback and testing strategies tend to affect student reactions to the
testing environment in different ways. As there were no differences in patterns
of responding to the psychological reactions items between self- and
computer-pacing conditions, pacing, as it was defined in this study, may not be
an important testing variable affecting psychological reactions. Feedback, on
the other hand, appeared to decrease the reported tension level of the examinee
and to encourage students to try harder on the test and to respond to each item
with the most appropriate answer. Testing strategy had no measurable effect on
the motivation or anxiety state of the student; but because the testing
strategies differed in difficulty, it did affect students' perceptions of test
difficulty. In general, these differences reflect the correct perception that
the stradaptive tests were tailored to the ability level of the individual,
whereas the conventional test was peaked at the ability level of the student
population. Thus, the conventional test was more often perceived to be too
difficult or too easy in comparison to the stradaptive test. Chi-square results
presented in Table 13 show that patterns of responding differed between students
in conventional and stradaptive test strategies on all Difficulty Perception
Scale items. In general, students taking the stradaptive test said that they
thought the test was a little difficult for someone of their ability and that
they were somewhat frustrated by the test difficulty. More students taking the
conventional test correctly responded in the extreme categories of the
difficulty perception items, indicating that the test was more often perceived
as being too hard or too easy.
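
A comparison of response patterns of this kind can be sketched with a Pearson
chi-square statistic computed on a testing strategy by response category
table. The sketch below is illustrative only: the function name is
hypothetical, and the counts are invented to mimic the pattern described above
(more extreme responses on the conventional test); they are not data from
Table 13.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts; columns are "too easy", "about right", "too hard".
counts = [
    [30, 40, 30],  # conventional test
    [10, 80, 10],  # stradaptive test
]
chi2, dof = pearson_chi_square(counts)
```

The resulting statistic would then be compared with the chi-square
distribution on the stated degrees of freedom to judge significance.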

Conclusions

The standardizing effect of feedback, both on psychological reactions to
testing and on test performance, was a finding that occurred repeatedly in this
study and one that should be investigated further. This standardizing effect
occurred with the Motivation Scale in a KR x Pacing interaction and on the
Anxiety Scale in the KR x Pacing x Testing Strategy interaction. In the former
interaction, levels of motivation were more similar between computer- and
self-paced conditions when feedback was provided than when it was not. Even more
striking was the lack of variation in mean Anxiety Scale scores across
experimental conditions when feedback was provided. Students in the six
experimental conditions, which derived from combinations of the three testing
strategies and two pacing conditions, reported anxiety levels that varied widely
when feedback was not provided. Research is needed to determine whether feedback
has such a standardizing effect when combined with other experimental testing
treatments. Furthermore, such research should deal directly with the apparent
standardizing effect of KR.

The present study, using high-ability students and the experimental
manipulation of testing strategy, KR administration, and pacing of item
administration, showed no effect of KR on ability estimates or on reported
motivation of the students. The expected increase in ability estimates due to
the motivating effects of KR was not found. One reason for this might have been
some interaction between variables that were not experimentally controlled.
Important variables that require particular attention in the study of the
motivational effects of KR may be the ability level of the experimental subjects
and the difficulty of the test and, thus, the proportion of positive or negative
feedback.

A study investigating the effects of KR should be implemented under "motivated"
conditions so that the experimental test would count toward a grade in a
required course. In this way, the true motivating effect might be better
assessed on students who are maximally motivated to perform well. Under such
conditions KR might increase anxiety to a detrimental degree and might result in
poorer test performance.

That KR has been shown to have an effect on performance and reported
motivation in several earlier studies (Betz & Weiss, 1976a, 1976b; Pine, Church,
Gialluca, & Weiss, 1979; Prestwood & Weiss, 1978) indicates that it is an
important testing parameter meriting further investigation. However, particular
attention must be paid to the experimental variables with which it is paired and
to the ability, test anxiety, and motivation levels of the examinee groups that
would be employed in the study.

APPENDIX: TABLES

Item Numbers, and Discrimination (a) and Difficulty (b)
Parameter Estimates for 50-Item Conventional Test

Item Number       a       b      Item Number       a       b

     597       .624      -0           329      1.424    .177
     382       .856   -.010           208       .743   -.17^
     292       .610    .012           670       .872    .196
     205       .603   -.024            91      1.132   -.197
     207       .793   -.035           622       .444    .201
     104       .944    .050            52       .844    .205
     137       .499   -.056           661       .799    .206
     444       .621    .059           667       .719   -.215
     209       .870    .067           502       .730    .218
     145       .791    .086           272      1.960    .223
     503      1.062   -.090           211       .773   -.236
     355       .506    .104            37       .860   -.236
     365       .877   -.105           645       .674    .242
     176       .415   -.106           224       .679   -.257
     380      1.822    .115           390       .797   -.257
     154       .872   -.124           327       .795    .258
     218       .407   -.125           221       .822   -.278
     234       .650   -.132           144       .910    .286
     161      1.384    .132           568      1.627    .290
      56      1.109    .135           369       .788    .295
     270      1.223   -.138           318       .526    .310
     143      1.036   -.153            50       .694   -.321
     599      1.634    .158           307       .699    .325
     156       .841   -.166           116       .494    .334
     626       .917    .172           128       1.04   -.355


80-3. Criterion-Related Validity of Adaptive Testing Strategies. June 1980.

80-2. Interactive Computer Administration of a Spatial Reasoning Test. April 1980.

Final Report: Computerized Adaptive Performance Evaluation. February 1980.

80-1. Effects of Immediate Knowledge of Results on Achievement Test Performance
      and Test Dimensionality. January 1980.

79-7. The Person Response Curve: Fit of Individuals to Item Characteristic Curve
      Models. December 1979.

79-6. Efficiency of an Adaptive Inter-Subtest Branching Strategy in the
      Measurement of Classroom Achievement. November 1979.

79-5. An Adaptive Testing Strategy for Mastery Decisions. September 1979.

79-4. Effect of Point-in-Time in Instruction on the Measurement of Achievement.
      August 1979.

79-3. Relationships among Achievement Level Estimates from Three Item
      Characteristic Curve Scoring Methods. April 1979.

Final Report: Bias-Free Computerized Testing. March 1979.

79-2. Effects of Computerized Adaptive Testing on Black and White Students.
      March 1979.

79-1. Computer Programs for Scoring Test Data with Item Characteristic Curve
      Models. February 1979.

78-5. An Item Bias Investigation of a Standardized Aptitude Test. December 1978.

78-4. A Construct Validation of Adaptive Achievement Testing. November 1978.

78-3. A Comparison of Levels and Dimensions of Performance in Black and White
      Groups on Tests of Vocabulary, Mathematics, and Spatial Ability.
      October 1978.

78-2. The Effects of Knowledge of Results and Test Difficulty on Ability Test
      Performance and Psychological Reactions to Testing. September 1978.

78-1. A Comparison of the Fairness of Adaptive and Conventional Testing
      Strategies. August 1978.

77-7. An Information Comparison of Conventional and Adaptive Tests in the
      Measurement of Classroom Achievement. October 1977.

77-6. An Adaptive Testing Strategy for Achievement Test Batteries. October 1977.

77-5. Calibration of an Item Pool for the Adaptive Measurement of Achievement.
      September 1977.

77-4. A Rapid Item-Search Procedure for Bayesian Adaptive Testing. May 1977.

77-3. Accuracy of Perceived Test-Item Difficulties. May 1977.

77-2. A Comparison of Information Functions of Multiple-Choice and Free-Response
      Vocabulary Items. April 1977.

77-1. Applications of Computerized Adaptive Testing. March 1977.

Final Report: Computerized Ability Testing, 1972-1975. April 1976.

76-5. Effects of Item Characteristics on Test Fairness. December 1976.

76-4. Psychological Effects of Immediate Knowledge of Results and Adaptive
      Ability Testing. June 1976.

76-3. Effects of Immediate Knowledge of Results and Adaptive Testing on Ability
      Test Performance. June 1976.

76-2. Effects of Time Limits on Test-Taking Behavior. April 1976.

76-1. Some Properties of a Bayesian Adaptive Ability Testing Strategy.
      March 1976.

75-6. A Simulation Study of Stradaptive Ability Testing. December 1975.

75-5. Computerized Adaptive Trait Measurement: Problems and Prospects.
      November 1975.

75-4. A Study of Computer-Administered Stradaptive Ability Testing. October 1975.

75-3. Empirical and Simulation Studies of Flexilevel Ability Testing. July 1975.

75-2. TETREST: A FORTRAN IV Program for Calculating Tetrachoric Correlations.
      March 1975.

75-1. An Empirical Comparison of Two-Stage and Pyramidal Adaptive Ability
      Testing. February 1975.

74-5. Strategies of Adaptive Ability Measurement. December 1974.

74-4. Simulation Studies of Two-Stage Ability Testing. October 1974.

74-3. An Empirical Investigation of Computer-Administered Pyramidal Ability
      Testing. July 1974.

74-2. A Word Knowledge Item Pool for Adaptive Ability Measurement. June 1974.

74-1. A Computer Software System for Adaptive Ability Measurement. January 1974.

73-4. An Empirical Study of Computer-Administered Two-Stage Ability Testing.
      October 1973.

73-3. The Stratified Adaptive Computerized Ability Test. September 1973.

73-2. Comparison of Four Empirical Item Scoring Procedures. August 1973.

73-1. Ability Measurement: Conventional or Adaptive? February 1973.

Copies  of  these  reports  are  available,  while  supplies  last,  from: 

Computerized  Adaptive  Testing  Laboratory 
N660  Elliott  Hall,  University  of  Minnesota 
75  East  River  Road,  Minneapolis,  MN  55455 


